Minimum spanning trees for gene expression data clustering.

Authors

  • Y Xu
  • V Olman
  • D Xu
Abstract

This paper describes a new framework for microarray gene-expression data clustering. The foundation of this framework is a minimum spanning tree (MST) representation of a set of multi-dimensional gene expression data. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem to a tree partitioning problem. We have demonstrated that, although the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages of representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which are otherwise highly computationally challenging; and (2) because MST-based clustering does not depend on the detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms in a software package, EXCAVATOR. To demonstrate its effectiveness, we have tested it on two data sets: expression data from the yeast Saccharomyces cerevisiae, and Arabidopsis expression data in response to chitin elicitation.
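To make the idea of converting clustering into tree partitioning concrete, here is a minimal Python sketch. It is not the EXCAVATOR implementation and is not one of the globally optimal algorithms mentioned in the abstract. It assumes Euclidean distance between expression profiles and uses the simplest partitioning rule, cutting the K-1 longest MST edges. The function names `minimum_spanning_tree` and `mst_clusters` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points, dist=euclidean):
    """Prim's algorithm on the complete graph over `points`.

    Returns a list of (weight, i, j) edges; runs in O(n^2) time.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Update the cheapest connection for every remaining node.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k, dist=euclidean):
    """Cut the k-1 longest MST edges and return one cluster label per point."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    edges = sorted(minimum_spanning_tree(points, dist))
    kept = edges[: len(edges) - (k - 1)]  # drop the k-1 heaviest edges

    # Union-find over the kept edges: each remaining subtree is one cluster.
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    # Toy data: two well-separated groups of two-dimensional "profiles".
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(data, 2))
```

Each subtree left after the cuts is reported as one cluster, which mirrors the subtree property described above. Real expression data would likely call for a correlation-based distance, passed in through the `dist` parameter.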

Similar resources

Ant-MST: An Ant-Based Minimum Spanning Tree for Gene Expression Data Clustering

We have proposed an ant-based clustering algorithm for document clustering based on the travelling salesperson scenario. In this paper, we present an approach called Ant-MST for gene expression data clustering based on both ant-based clustering and minimum spanning trees (MST). The ant-based clustering algorithm is first used to construct a fully connected network of nodes. Each node repres...


Clustering Gene Expression Data with Memetic Algorithms based on Minimum Spanning Trees

With the invention of microarray technology, researchers are capable of measuring the expression levels of tens of thousands of genes in parallel at various time points of a biological process. During the investigation of gene regulatory networks and general cellular mechanisms, biologists are attempting to group genes based on the time-dependent pattern of the obtained expression levels. In this...


Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees

MOTIVATION: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem. RESULTS: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), ...


Performance of Improved Minimum Spanning Tree Based on Clustering Technique

Clustering is one of the most important and basic tools for data mining. Because they can detect clusters with irregular boundaries, minimum spanning tree-based clustering algorithms have been widely used in practice. In such clustering algorithms, the search for nearest objects in the construction of minimum spanning trees is the main source of computation


Algorithm for Clustering Gene Expression Data with Outliers Using Minimum Spanning Tree

Microarrays enable biologists to study genome-wide patterns of gene expression in any given cell type at any given time and under any given set of conditions. Identifying groups of genes that manifest similar expression patterns is important in the analysis of gene expression in time series data. In this paper, multidimensional gene expression data is represented using a Minimum Spanning Tree (MST)....



Journal title:
  • Genome informatics. International Conference on Genome Informatics

Volume 12

Publication date: 2001